This notebook tries the [semi-supervised methods][semi] described in the documentation. Label propagation essentially tries to assign labels to the test data based on the labels in the training data.
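As a rough illustration of the idea, here is a minimal sketch on toy data (the arrays are made up for the example and are not from this project). In scikit-learn's `LabelPropagation`, unlabeled samples are marked with `-1`, and the labels inferred for every sample are available in `transduction_` after fitting.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Toy data (hypothetical): two well-separated clusters on a line.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
# Only one sample per cluster is labeled; -1 marks the unlabeled samples.
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelPropagation()  # default RBF kernel
model.fit(X, y)

# Labels inferred for all samples, labeled and unlabeled alike.
print(model.transduction_)
```

The unlabeled points should pick up the label of the labeled point in their own cluster, since the affinity between the two clusters is negligible.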
In [1]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams['figure.figsize'] = 8, 12
plt.rcParams['axes.grid'] = True
plt.set_cmap('brg')
In [2]:
cd ..
In [3]:
from python import utils
In [4]:
with open("settings/testing_labelprop.json") as fh:
    settings = utils.json.load(fh)
In [5]:
with open("segmentMetadata.json") as fh:
    meta = utils.json.load(fh)
In [6]:
data = utils.get_data(settings)
In [8]:
da = utils.DataAssembler(settings,data,meta)
Then we just need to build training sets for each subject and apply the relevant models. Unfortunately, the cross-validator doesn't handle test segments so we won't be able to run any informative cross-validation.
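The stacking step can be sketched on synthetic data as follows. The arrays here are hypothetical stand-ins for what `da.build_training` and `da.build_test` return; the point is only that the test rows sit after the training rows in the stacked matrix, so they are recovered by slicing from `Xtrain.shape[0]` onwards.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.RandomState(0)

# Hypothetical stand-ins for the training and test sets of one subject.
Xtrain = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(3, 0.1, (10, 2))])
ytrain = np.array([0] * 10 + [1] * 10)
Xtest = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])

# Stack labeled and unlabeled rows; -1 marks the unlabeled test rows.
X = np.vstack([Xtrain, Xtest])
y = np.hstack([ytrain, np.full(Xtest.shape[0], -1)])

clf = LabelPropagation().fit(X, y)

# The test rows follow the training rows, so slice from n_train onwards.
n_train = Xtrain.shape[0]
test_proba = clf.predict_proba(X)[n_train:, :]
test_labels = clf.transduction_[n_train:]
print(test_proba.shape, test_labels)
```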
In [15]:
import sklearn.ensemble
import sklearn.preprocessing
import sklearn.semi_supervised
In [19]:
scaler = sklearn.preprocessing.StandardScaler()
selector = sklearn.ensemble.ExtraTreesClassifier(n_estimators=1000)
classifier = sklearn.semi_supervised.LabelPropagation()
In [23]:
predictions = {}
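for subject in settings['SUBJECTS']:
    print("Processing " + subject)
    Xtrain, ytrain = da.build_training(subject)
    Xtest = da.build_test(subject)
    # stack labeled and unlabeled rows; -1 marks the unlabeled test rows
    X = np.vstack([Xtrain, Xtest])
    y = np.hstack([ytrain, np.array([-1.0]*Xtest.shape[0])])
    print("Fitting ExtraTree feature selection.")
    # standardize the stacked data, then fit the selector on the labeled rows
    X = scaler.fit_transform(X)
    selector.fit(Xtrain, ytrain)
    print("Applying ExtraTree feature selection.")
    # note: later scikit-learn releases removed transform() on estimators;
    # SelectFromModel(selector, prefit=True) is the replacement there
    X = selector.transform(X)
    print("Fitting classifier.")
    # then fit the classifier on labeled and unlabeled rows together
    classifier.fit(X, y)
    print("Classifying test data.")
    # keep only the test rows, which follow the training rows in X
    predictions[subject] = classifier.predict_proba(X)[Xtrain.shape[0]:, :]
    # only the first subject is processed for now
    break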
In [24]:
predictions
Out[24]:
I'm unsure why that is happening; label propagation may rely on an assumption I'm unaware of that is causing problems.
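One assumption that might be worth checking (a guess, not a diagnosis of the output above) is the kernel width. `LabelPropagation` defaults to an RBF kernel with `gamma=20`, and on standardized data with many features the affinity between distinct samples can underflow to zero, leaving nothing to propagate. The sketch below uses random data as a hypothetical stand-in for the scaled feature matrix.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)

# Hypothetical stand-in for standardized features: 20 samples, 200 features.
X = rng.normal(size=(20, 200))

# gamma=20 matches the LabelPropagation default for the RBF kernel.
K = rbf_kernel(X, gamma=20)

# Affinities between distinct samples (everything off the diagonal).
off_diagonal = K[~np.eye(K.shape[0], dtype=bool)]
print("largest affinity between distinct samples:", off_diagonal.max())
```

If the real feature matrix behaves the same way, a much smaller `gamma` or `kernel="knn"` might be worth trying.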